This vignette outlines the steps of Integration, clustering, differential analysis, functional annotation and visualization of intracellular pathogens at single-cell level using PathogenTrackExplorer. PathogenTrackExplorer requires Gene Expression Matrix (produced by cellranger) and Pathogen Abundance Matrix (produced by PathogenTrack) as the user input and explores intracellular pathogens by integration, clustering, differential analysis, functional annotation and visualization.

Load the required packages

library(Seurat)
library(Matrix)
library(dplyr)
library(ggplot2)
library(reshape2)
library(cowplot)
library(harmony)
library(ggthemes)
library(ggrepel)
library(topGO)
library(org.Hs.eg.db)
library(PathogenTrackExplorer)

Data Input & Build a Integrated Seurat Object

PathogenTrackExplorer requires Gene Expression Matrix (produced by cellranger) and Pathogen Abundance Matrix (produced by PathogenTrack). PathogenTrackExplorer integrated these matrix automatically, and build an integrated Seurat object. This step may take several hours to complete.

#SeuratObject <- vIntegrate(x="/home/scrna/Lung/Pathogen-Track/SampleInfo.txt", min.nFeature=500, max.nFeature=4000, percent.mito=10, min.nCount_RNA=1000, max.nCount_RNA = Inf, nVariable=2000, nPC=30, res=0.5)
#SeuratObject <- subset(SeuratObject, group=="NSCLC")
#saveRDS(SeuratObject, file = "/home/scrna/NSCLC_SeuratObject.rds")
SeuratObject <- readRDS("/home/scrna/NSCLC_SeuratObject.rds")

Finding cluster biomarkers

We use Seurat::FindAllMarkers to find cluster biomarkers. Only ‘MAST’ test method is used in PathogenTrackExplorer.

SeuratObject@misc$DE <- Seurat::FindAllMarkers(object = SeuratObject, assay = 'RNA', only.pos = TRUE, test.use = 'MAST')

Function annotation of each cluster

We employ topGO to perform enrichment analysis.

SeuratObject@misc$GO <- xGO(object = SeuratObject, key = "DE", clusters = NULL)

Visualize cells and clusters in Dimensional Reduction Plot

We use Seurat::DimPlot to show cells and clusters in UMAP graph. Each dot represents a cell, and each color represents the identity class.

DimPlot(SeuratObject, cols = SeuratObject@misc$cols)

Visualize cell population fractions in samples

xPopulationPlot function is built to visualize cell population fraction in samples or clusters.

xPopulationPlot(SeuratObject, by="sample", cols=SeuratObject@misc$cols)

Visualize average expression level in DotPlot

The xDotPlot is a wrapper for Seurat::DotPlot. The size of the dot stands for the percentage of cells expressing the feature in each cluster. The color represents the average expression level. Here we use it to show the top 3 high expressed features in each cluster.

SeuratObject@misc$DE %>% group_by(cluster) %>% top_n(n = 3, wt = avg_logFC) -> top3
xDotPlot(SeuratObject, features = top3$gene)

Visualize feature expression in FeaturePlot

The xFeaturePlot is a wrapper for Seurat::FeaturePlot, and it accepts one or more features to draw a patchworked figure automatically.

markers = c('CD3D', 'MS4A1', 'SDC1', 'MZB1', 'TPPP3', 'KRT18', 'CD68', 'CD163', 'FCN1', 'FCGR3B', 'HBA1', 'KLRD1', 'FCER1A', 'CD1C', 'CLEC9A', 'LILRA4', 'TPSB2', 'CD34', 'FUT4', 'LYZ')
xFeaturePlot(object=SeuratObject, features=markers, font.size = 8)

#png(file="SeuratObject_markers_plot.png", width = dpi*12, height = dpi*8, units = "px",res = dpi,type='cairo')
#xFeaturePlot(object=SeuratObject, features=markers, font.size = 8)
#dev.off()

Visualize GO enrichment of specific cluster

Here, we use xGOBarPlot to show GO enchriment of specific cluster.

xGOBarPlot(SeuratObject, key = "GO", cluster = "9", ont = "BP", top_n = 10)

Visualize GO enrichment of all clusters

xGOBoxPlot is used to show GO enrichment of all clusters.

xGOBoxPlot(SeuratObject, key = "GO", ont = "BP", top_n = 3)

Also, we can use xGODotPlot to show GO enrichment of all clusters.

xGODotPlot(SeuratObject, key = "GO", ont = "BP", top_n = 3)

Visualize pathogen abundance in FeaturePlot

Top 25 abundant pathogens are used to show in xFeaturePlot.

species <- c("Pseudomonas tolaasii", "Xanthomonas euvesicatoria pv. alfalfae CFBP 3836", "Streptomyces sp. GY16", "Clostridium botulinum", "Wenyingzhuangia fucanilytica", "Bacillus megaterium", "Klebsiella oxytoca", "Microcystis aeruginosa FD4", "Porphyrobacter neustonensis", "Acinetobacter sp. WCHAc010034", "Hathewaya histolytica", "Candidatus Desulforudis audaxviator MP104C", "Pasteurella multocida subsp. multocida", "Mycoplasma wenyonii str. Massachusetts", "Halanaerobium hydrogeniformans", "Pseudomonas sp. R32", "Staphylococcus aureus", "Oceanisphaera profunda", "Flavobacterium alkalisoli", "Staphylococcus epidermidis", "Nostoc sp. NIES-4103", "Escherichia coli", "Labilibaculum antarcticum", "Anaerohalosphaera lusitana", "Bradymonadales bacterium V1718", "Mucilaginibacter sp. HYN0043", "Streptomyces collinus Tu 365", "Rahnella sp. ERMR1:05", "Intrasporangium calvum DSM 43043", "Treponema brennaborense DSM 12168", "Helicobacter bilis", "Parabacteroides distasonis ATCC 8503", "Leptotrichia shahii", "Staphylococcus hominis", "Gemella sanguinis", "Acidihalobacter prosperus", "Pseudomonas cedrina", "Spiroplasma sp. TIUS-1", "Mycoplasma hyorhinis", "Brevibacillus laterosporus", "Pasteurella multocida", "Corynebacterium minutissimum", "Methanobrevibacter smithii", "Allochromatium vinosum DSM 180", "Actinoplanes sp. N902-109", "Halomonas sp. N3-2A", "Clostridium cellulovorans 743B", "Sphaerospermopsis kisseleviana NIES-73", "Campylobacter jejuni", "Saccharomonospora glauca K62")
species <- intersect(species, rownames(SeuratObject)) %>% head(25)
xFeaturePlot(object=SeuratObject, features=species, font.size = 8)

#png(file="SeuratObject_species_plot.png", width = dpi*16, height = dpi*12, units = "px",res = dpi,type='cairo')
#xFeaturePlot(object=SeuratObject, features=species, font.size = 8, order = TRUE)
#dev.off()

Visualize pathogen abundance in FeaturePlot

Here, the abundance of “Streptomyces sp. GY16” is shown in xFeaturePlot.

xFeaturePlot(object=SeuratObject, features=c("Streptomyces sp. GY16"), font.size = 8, order = TRUE)

Differential analysis between Pathogen infected(+) and the bystanders(-)

The vDE is a wrapper of Seurat::FindMarkers, which is employed to do differential analysis between pathogen infected and pathogen negative cells of each cluster. Here we use vDE to find differential genes between “Streptomyces sp. GY16” positive and “Streptomyces sp. GY16” negative cells in cluster “7” and cluster “8”.

SeuratObject@misc$vDE <- vDE(SeuratObject, features.by=c("Streptomyces sp. GY16"), min.cells = 20, clusters = as.character(7:8))
## ### Analysis Cluster-7 ...
## ====== Analysis feature Streptomyces sp. GY16 ... done.
## ### Analysis Cluster-8 ...
## ====== Analysis feature Streptomyces sp. GY16 ... done.

Heatmap shows differentially expressed features between pathogen(+) and pathogen(-) cells

The vHeatMap is a warpper of Seurat::DoHeatmap, which is employed to visualize pathogen infected(+) and pathogen negative(-) cells of each cluster in heatmap. The heatmap of differential genes in cluster “7” and “8” are shown.

vHeatMap(object = SeuratObject, key = "vDE", cluster = "7", feature.by = "Streptomyces sp. GY16")

vHeatMap(object = SeuratObject, key = "vDE", cluster = "8", feature.by = "Streptomyces sp. GY16")

VolcanoPlot shows differentially expressed features between pathogen(+) and pathogen(-) cells

The vVolcanoPlot enables quick visual identification of significant features with small pvalue and large fold changes. The volcano plot of differential genes in cluster “7” and “8” are shown.

vVolcanoPlot(object = SeuratObject, key = "vDE", feature.by = "Streptomyces sp. GY16", cluster = "7", top_n = NULL)

vVolcanoPlot(object = SeuratObject, key = "vDE", feature.by = "Streptomyces sp. GY16", cluster = "8", top_n = NULL)

GO enrichment of differentially expressed features between pathogen(+) and pathogen(-) cells of each cluster

We employ topGO to perform enrichment analysis of differentially expressed features between pathogen(+) and pathogen(-) cells of each cluster. Here we perform go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “7” as an example.

SeuratObject@misc$vGO <- vGO(SeuratObject, key = "vDE", cluster = "7", feature.by = "Streptomyces sp. GY16")
## [1] "Running Streptomyces sp. GY16 in Cluster-7 ... "
## [1] "done."

Visualize GO enrichment using vGOBarPlot or vGODotPlot

Here we use vGOBarPlot to show the go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “7” as an example.

vGOBarPlot(SeuratObject, key="vGO", cluster="7", feature="Streptomyces sp. GY16", ont="BP", top_n=20)

Then we use vGODotPlot to show the go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “7” as an example. The color represents the significance, and the dot size represents for gene numbers that enriched in the corresponding pathway.

vGODotPlot(SeuratObject, key="vGO", cluster="7", feature="Streptomyces sp. GY16", ont="BP", top_n=20)

Here we perform go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “8” as an example.

SeuratObject@misc$vGO <- vGO(SeuratObject, key = "vDE", cluster = "8", feature.by = "Streptomyces sp. GY16")
## [1] "Running Streptomyces sp. GY16 in Cluster-8 ... "
## [1] "done."

Then we use vGODotPlot to show the go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “8” as an example.

vGOBarPlot(SeuratObject, key="vGO", cluster="8", feature="Streptomyces sp. GY16", ont="BP", top_n=20)

Then we use vGODotPlot to show the go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “8” as an example. The color represents the significance, and the dot size represents for gene numbers that enriched in the corresponding pathway.

vGODotPlot(SeuratObject, key="vGO", cluster="8", feature="Streptomyces sp. GY16", ont="BP", top_n=20)

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
## 
## locale:
##  [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=zh_CN.UTF-8        LC_COLLATE=zh_CN.UTF-8    
##  [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
##  [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] PathogenTrackExplorer_0.1.0 org.Hs.eg.db_3.10.0        
##  [3] topGO_2.38.1                SparseM_1.78               
##  [5] GO.db_3.10.0                AnnotationDbi_1.48.0       
##  [7] IRanges_2.20.2              S4Vectors_0.24.4           
##  [9] Biobase_2.46.0              graph_1.64.0               
## [11] BiocGenerics_0.32.0         ggrepel_0.8.2              
## [13] ggthemes_4.2.0              harmony_1.0                
## [15] Rcpp_1.0.5                  cowplot_1.0.0              
## [17] reshape2_1.4.4              ggplot2_3.3.2              
## [19] dplyr_1.0.0                 Matrix_1.2-17              
## [21] Seurat_3.1.5               
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15                  colorspace_1.4-1           
##   [3] ellipsis_0.3.1              ggridges_0.5.2             
##   [5] XVector_0.26.0              GenomicRanges_1.38.0       
##   [7] leiden_0.3.3                listenv_0.8.0              
##   [9] farver_2.0.3                bit64_0.9-7                
##  [11] codetools_0.2-16            splines_3.6.0              
##  [13] knitr_1.29                  jsonlite_1.7.1             
##  [15] ica_1.0-2                   cluster_2.0.8              
##  [17] png_0.1-7                   uwot_0.1.8                 
##  [19] sctransform_0.2.1           compiler_3.6.0             
##  [21] httr_1.4.2                  lazyeval_0.2.2             
##  [23] prettyunits_1.1.1           htmltools_0.5.0            
##  [25] tools_3.6.0                 rsvd_1.0.3                 
##  [27] igraph_1.2.5                gtable_0.3.0               
##  [29] glue_1.4.1                  GenomeInfoDbData_1.2.2     
##  [31] RANN_2.6.1                  rappdirs_0.3.1             
##  [33] vctrs_0.3.1                 ape_5.4                    
##  [35] nlme_3.1-139                lmtest_0.9-37              
##  [37] xfun_0.15                   stringr_1.4.0              
##  [39] globals_0.12.5              lifecycle_0.2.0            
##  [41] irlba_2.3.3                 future_1.18.0              
##  [43] zlibbioc_1.32.0             MASS_7.3-51.4              
##  [45] zoo_1.8-8                   scales_1.1.1               
##  [47] MAST_1.12.0                 hms_0.5.3                  
##  [49] SummarizedExperiment_1.16.1 RColorBrewer_1.1-2         
##  [51] SingleCellExperiment_1.8.0  yaml_2.2.1                 
##  [53] memoise_1.1.0               reticulate_1.16            
##  [55] pbapply_1.4-2               gridExtra_2.3              
##  [57] stringi_1.4.6               RSQLite_2.2.0              
##  [59] BiocParallel_1.20.1         GenomeInfoDb_1.22.1        
##  [61] rlang_0.4.7                 pkgconfig_2.0.3            
##  [63] matrixStats_0.56.0          bitops_1.0-6               
##  [65] evaluate_0.14               lattice_0.20-38            
##  [67] ROCR_1.0-11                 purrr_0.3.4                
##  [69] patchwork_1.0.1             htmlwidgets_1.5.1          
##  [71] labeling_0.3                bit_1.1-15.2               
##  [73] tidyselect_1.1.0            RcppAnnoy_0.0.16           
##  [75] plyr_1.8.6                  magrittr_1.5               
##  [77] R6_2.4.1                    generics_0.0.2             
##  [79] DelayedArray_0.12.3         DBI_1.1.0                  
##  [81] pillar_1.4.6                withr_2.2.0                
##  [83] fitdistrplus_1.1-1          abind_1.4-5                
##  [85] survival_3.2-3              RCurl_1.98-1.2             
##  [87] tibble_3.0.3                future.apply_1.6.0         
##  [89] tsne_0.1-3                  crayon_1.3.4               
##  [91] KernSmooth_2.23-15          plotly_4.9.2.1             
##  [93] rmarkdown_2.3               progress_1.2.2             
##  [95] grid_3.6.0                  data.table_1.12.8          
##  [97] blob_1.2.1                  digest_0.6.25              
##  [99] tidyr_1.1.0                 munsell_0.5.0              
## [101] viridisLite_0.3.0